knitr::opts_chunk$set(
fig.align = "center", message = F, warning = F, cache = T, cache.lazy = F,
class.source = "fold-hide"
)
If packages are already installed on your system, you could just use library(tibble). Otherwise you have to install it first, here you have multiple options:
install.packages("insert_your_package_name")pacman::p_load("insert_your_package_name") This loads your package and installs it if not already present.Usually, the packages you install are retrieved from CRAN: https://cran.r-project.org/ but not all packages are hosted there. Especially for bio-related packages, there is Bioconductor: https://bioconductor.org/
To install packages from Bioconductor, you can either follow the instructions on the package site on Bioconductor (e.g. https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html ) which includes installing the BiocManager package.
Also pacman also searches Bioconductor, but only if BiocManager is installed before!
Some packages are neither on CRAN nor on Bioconductor but on github. github is a platform to versionize and distribute code. To install github packages, you can:
devtools::install_github("JinmiaoChenLab/Rphenograph")pacman::p_load_gh("JinmiaoChenLab/Rphenograph")Additionally, you might need Rtools to build R packages: https://cran.r-project.org/bin/windows/Rtools/rtools40.html If you just installed rtools you will need to restart Rstudio.
if (!require(pacman)) {
install.packages("pacman") # If not already installed
}
pacman::p_load("tidyverse", install = TRUE)
pacman::p_load("cowplot", install = TRUE)
pacman::p_load("uwot", install = TRUE)
pacman::p_load("data.table", install = TRUE)
pacman::p_load("pheatmap", install = TRUE) # Todo
pacman::p_load("needs", install = TRUE)
pacman::p_load("knitr", install = TRUE)
pacman::p_load("ggridges", install = TRUE)
pacman::p_load("grDevices", install = TRUE)
# Install BiocManager such that pacman can then find ComplexHeatmap by its own
pacman::p_load("BiocManager", install = TRUE)
pacman::p_load("ComplexHeatmap", install = TRUE)
# Github packages have to be dealt in a special way
# However you install them, you _need_ the "devtools" package
pacman::p_load("devtools", install = TRUE)
# Additionally, you might need "Rtools"
pacman::p_load_gh("JinmiaoChenLab/Rphenograph", install = TRUE)
pacman::p_load_gh("JinmiaoChenLab/cytofkit", install = TRUE)
prioritize(dplyr)
count <- dplyr::count
theme_set(theme_classic())
In this tutorial we will analyse a public data set (Georg et al. 2021) of single cell phospho-proteomics measured with Cytometry by Time of Flight (CyTOF). In this study, the authors performed CyTOF of whole blood samples from mild and severe COVID-19 patients during the acute and convalescent phase, and patients with other acute respiratory infections (Flu-like illness), as well as patients chronically infected by human immunodeficiency virus (HIV) or hepatitis B (HBV) and healthy controls. They analysed the T cell space and identified highly activated CD16+ T cells in severe COVID-19, which led the authors to hypothesise about the pathological role of these cytotoxic T cells. This hypothesis was then tested and confirmed with functional analyses, and found suitable mechanisms for their induction. In this tutorial you will learn how to perform such computational analysis either in T cells, B cells, or monocytes!
Due to time constrains, we will analyse 5% of the data set. We will start by manually pre-gating the immune cell type of interest, then we will visually explore the data by reducing the dimensionality and plotting a UMAP. After this, we will cluster the data to find discrete communities of cells, using two different algorithms for comparison. Then we will annotate/give names to these clusters by looking at the average protein expression in each community/cluster. Finally, we will calculate the abundance of each cluster per donor and identify COVID-19 or severe-specific clusters.
We start reading the data table “data_norm_sub.csv” with the function read.csv(). * The data has already been pre-processed + calibration beads excluded (gating on Ce140 Bead channel) + doublets and debris excluded (gating on DNA channels and Event_length) + dead cells excluded (gating on Live-Dead mDOTA marker) + batch corrected (normalization using BatchAdjust method, linearly scales signal distributions to similar ranges using percentiles)
# results='hide' does not print messages to the generated output file
# Standard read.csv, works for most things, but fread is faster.
# data_norm_sub <- read.csv("~/Documents/INsTRuCT_workshop/data_norm_sub5.csv")
data_norm_sub_fread <- data.table::fread("../data/data_norm_sub5.csv")
# fread is a special format, such that we have to convert it first into a usual data.frame
data_norm_sub <- data.frame(data_norm_sub_fread)
What are the columns? What are the rows?
# Show the first 5 rows and all columns --> Rows are cells, columns are features per cell
print(data_norm_sub[1:5, ])
## cellid Run FCS.Filename id
## 1 14635738 200527 200527_Blut_Panel1_CV19_BC_10_viable.fcs HC-4, r200527
## 2 1131553 210130 210130_Blut_Panel1_CV19_BC_3_viable.fcs CV-133, month 3
## 3 10833199 200619 200619_Blut_Panel1_CV19_BC_1_viable.fcs HC-IJ, r200619
## 4 2711646 210129 210129_Blut_Panel1_CV19_BC_4_viable.fcs CV29, month 6
## 5 2706623 210129 210129_Blut_Panel1_CV19_BC_4_viable.fcs CV29, month 6
## Individuals Group Severity Disease.phase max..WHO.scale sev_merge
## 1 HC-4 HC healthy acute NA healthy
## 2 CV-133 CV19 critical convalescent 7 severe/critical
## 3 HC-IJ HC healthy acute NA healthy
## 4 CV29 CV19 moderate convalescent 4 mild/moderate
## 5 CV29 CV19 moderate convalescent 4 mild/moderate
## Days.post.symptom.onset Week sev_week followup
## 1 <NA> <NA> healthy <NA>
## 2 month 3 month 3 severe/critical, month 3 fatigue
## 3 <NA> <NA> healthy <NA>
## 4 month 6 month 6 mild/moderate, month 6 no-fatigue
## 5 month 6 month 6 mild/moderate, month 6 no-fatigue
## CD45 CD3 CD19 CD15 CD8 TCRgd CD62L
## 1 289.606079 150.939178 0.9167342 0.00000 242.0820007 0.00000 0.9683754
## 2 4.722483 0.000000 0.0000000 27.07735 5.3527999 0.00000 202.9331512
## 3 26.286074 0.000000 0.5317233 0.00000 0.6332178 0.00000 115.2458649
## 4 10.766818 9.825749 1.6423904 14.03760 20.0084972 14.71366 185.6015778
## 5 306.810120 94.136932 0.0000000 0.00000 37.2872429 0.00000 10.7689219
## CD45RO CD28 CD27 CD226 ICOS PD1 Lag3
## 1 3.203919 0.0998394 1.028889 32.7283173 1.215678 2.335296 0.31030640
## 2 74.573257 0.0000000 0.000000 1.5003730 0.000000 2.362798 2.02808976
## 3 38.330097 0.0000000 0.000000 1.0449100 0.000000 0.000000 9.55651665
## 4 37.360798 3.2174973 1.021154 0.3054930 17.149399 10.483729 0.07147276
## 5 47.494534 42.9645615 78.948204 0.3698978 0.000000 0.000000 0.00000000
## TIGIT CD96 CD25 CD56 HLADR CD38 CD137
## 1 2.4361346 1.660355 0.000000 33.5058823 0.0000000 1.3536147 0.000000
## 2 8.1984510 6.666277 0.000000 0.1530982 0.0000000 21.0363445 0.000000
## 3 0.0000000 5.227810 3.529121 0.0000000 0.0000000 10.1701736 0.000000
## 4 19.8741760 11.719555 3.712134 0.0000000 0.7540856 24.1808929 3.163666
## 5 0.1908786 2.546410 8.808489 0.0000000 0.0000000 0.2533335 0.000000
## CD69 Ki67 CXCR3 CXCR5 CCR6 CRTH2 KLRB1 KLRG1
## 1 1.734452 0.00000 169.36728 0 3.7821393 4.528113 0 51.88726
## 2 6.979673 17.71251 125.74521 0 23.3952847 2.280042 0 12.58200
## 3 0.000000 26.01557 122.19785 0 1.0721407 5.629048 0 24.32183
## 4 4.214287 97.56147 82.18497 0 0.9035059 26.126760 0 19.51483
## 5 0.000000 0.00000 104.22880 0 0.6558527 8.333822 0 4.27186
## KLRF1 CD95 CD10 CD16 CD34 CD123 CD11c CD21
## 1 5.042776 16.67289 0.52912545 0.0000 2.020787001 0.000000 8.559193 0
## 2 14.849868 25.53127 53.66223526 408.7447 19.129014969 5.730159 2.268054 0
## 3 0.000000 24.80001 57.93495560 297.4183 0.005866973 0.000000 3.638553 0
## 4 6.316540 31.96333 108.64481354 320.0549 5.801159382 4.664691 8.716877 0
## 5 12.600286 15.25747 0.06902131 0.0000 0.000000000 45.156952 6.199719 0
## CD14 IgD IgM
## 1 0.0000000 1.800159 7.0955362
## 2 0.2520287 1.032365 0.5208321
## 3 5.5783634 0.000000 0.0000000
## 4 1.9058144 7.746778 6.3418970
## 5 0.0000000 0.000000 16.6123581
# Then take a look at only the column names
colnames(data_norm_sub)
## [1] "cellid" "Run"
## [3] "FCS.Filename" "id"
## [5] "Individuals" "Group"
## [7] "Severity" "Disease.phase"
## [9] "max..WHO.scale" "sev_merge"
## [11] "Days.post.symptom.onset" "Week"
## [13] "sev_week" "followup"
## [15] "CD45" "CD3"
## [17] "CD19" "CD15"
## [19] "CD8" "TCRgd"
## [21] "CD62L" "CD45RO"
## [23] "CD28" "CD27"
## [25] "CD226" "ICOS"
## [27] "PD1" "Lag3"
## [29] "TIGIT" "CD96"
## [31] "CD25" "CD56"
## [33] "HLADR" "CD38"
## [35] "CD137" "CD69"
## [37] "Ki67" "CXCR3"
## [39] "CXCR5" "CCR6"
## [41] "CRTH2" "KLRB1"
## [43] "KLRG1" "KLRF1"
## [45] "CD95" "CD10"
## [47] "CD16" "CD34"
## [49] "CD123" "CD11c"
## [51] "CD21" "CD14"
## [53] "IgD" "IgM"
Not all features of the cell are actual cell markers. We want to subset the values to only these markers. Therefore, create a vector with the name of each measured protein (what we call “the CyTOF panel”).
panel <- colnames(data_norm_sub)[15:54]
# color_severity is a named vector which we will use later.
# #xxxxxx is a html-coded color code.
color_severity <- c(
"healthy" = "#0449FF",
"FLI" = "#807F7F",
"HIV" = "#40007F",
"HBV" = "magenta",
"mild/moderate" = "#FFB651",
"severe/critical" = "#F82000"
)
Let’s look at how the values of markers are distributed:
In the CyTOF community, people commonly use the hyperbolic arcsine (asinh) transformation: \[ \rm asinh(x) = \ln(x + \sqrt{x^2+1}) \]
data_norm_sub %>%
# take only 20% of the data such that the plots are generated faster
sample_frac(0.2) %>%
# Pivot: All columns defined by "panel" will go into two columns.
# The values go into the new column "value"
# The names of the columns go into the new column "marker"
pivot_longer(names_to = "marker", values_to = "value", panel) %>%
# start plotting, x-axis is the value
ggplot(aes(x = value)) +
# We want density plots, so how the values on the x-axis are distributed
geom_density() +
# facet_wrap splits the plots according to the specified column, here "marker"
# scale="free" tells that the x-axis and y-axis are not the same for each plot
facet_wrap(~marker, scale = "free")
We notice that these are skewed distributions: Many small values, some very large values. Therefore, it makes more sense to look at these on a logarithmic scale:
# Transform all columns defined by "panel" with asinh()
data_norm_sub_trans <- data_norm_sub %>%
# Apply asinh() on all columns defined by panel
mutate_at(vars(panel), asinh)
data_norm_sub_trans %>%
# take only 20% of the data such that the plots are generated faster
sample_frac(0.2) %>%
# Pivot: All columns defined by "panel" will go into two columns.
# The values go into the new column "value"
# The names of the columns go into the new column "marker"
pivot_longer(names_to = "marker", values_to = "value", panel) %>%
# Start plotting, define "value" as x-axis
ggplot(aes(x = value)) +
# Density plots, how are the values on the x-axis distributed
geom_density() +
# Split the plots according to the "marker" column, different axis
facet_wrap(~marker, scale = "free")
Using ggplot() and geom_point(), generate a scatter plot to decide the gates.
We visualize just 10% of the data.
data_norm_sub_trans %>%
# The plots would be overwhelmed with cells being exactly 0 (CyTOF-"ownness")
filter(CD45 > 0, CD3 > 0) %>%
# Use only 10% of the data
sample_frac(0.1) %>%
# Make a plot where the x-axis are the CD45 values, and the y-axis are the CD3 values
ggplot(aes(x = CD45, y = CD3)) +
# Each value will be a _point_ with a certain size
# and a certain transparency (alpha)
geom_point(size = 0.01, alpha = 0.1) +
# Additionally to the points, plot the density of the points in 2 dimensions
geom_density_2d() +
# Plot a rectangle with the given coordinates
# We found the coordinates by hand
# Alpha=0 because otherwise you would have a filled rectangle
geom_rect(mapping = aes(xmin = 1, xmax = 8, ymin = 4.3, ymax = 8), color = "black", alpha = 0)
Exclude neutrophils (CD15+) and B cells (CD19+, they exchange antigens with T cells)
data_norm_sub_trans %>%
# Retain all cells where the following rules apply
filter(
CD3 > 4.3,
CD45 > 1,
CD15 > 0,
CD19 > 0
) %>%
# Plot CD15 (y-axis) vs CD19 (x-axis)
ggplot(aes(x = CD19, y = CD15)) +
# Plot points with certain size and color alpha
geom_point(size = 0.01, alpha = 0.1) +
# Make 2d-density plots
geom_density_2d() +
# Plot a rectangle
# Alpha because geom_rect makes a filled rectangle otherwise
geom_rect(mapping = aes(
xmin = 0, xmax = 2.9, ymin = 0, ymax = 4
), color = "black", alpha = 0)
We first add a column to the data table where each cell gets the classification T cell = {TRUE, FALSE} according to your gating strategy. Then let’s see if the percentage of T cells make sense and how it looks per disease group (variable sev_merge).
data_norm_sub_trans <- data_norm_sub_trans %>%
# ifelse(logical_value, if TRUE, if FALSE)
# ifelse returns the value inside if TRUE when the logical
# value is TRUE, otherwise the other value.
# Draw proper gates in the transformed values to get the (CD45+CD3+CD15-CD19-) cells
mutate(Tcell = ifelse(CD3 > 4.3 & CD45 > 1 & CD15 < 4 & CD19 < 2.9, TRUE, FALSE))
data_norm_sub_trans <- data_norm_sub_trans %>%
# Factors are R-internal special vectors having "levels".
# All values must be a value of "levels". Internally, a factor is a numeric value pointing
# to one of the levels.
# ggplot is smart enough to handle factors properly (better than only vectors if needed)!
mutate(
sev_merge = factor(
sev_merge,
levels = c("healthy", "FLI", "HIV", "HBV", "mild/moderate", "severe/critical")
)
)
data_norm_sub_trans %>%
# count for each combination of
# id: Sample id
# Tcell: If it is a Tcell or not
# sev_merge: Which severity it is (sev_merge = mild+moderate and severe+critical merged)
# Disease.phase: acute or covalescent
# How many cells there are
# Count saves that in a new column "n"
count(id, Tcell, sev_merge, Disease.phase) %>%
# Result:
# id Tcell sev_merge Disease.phase n
# 19 FALSE severe/critical acute 1806
# 19 TRUE severe/critical acute 22
# 15 FALSE severe/critical acute 2407
# tidyverse functions can leverage "groups" of data, but you have to specify first
# what the group is defined on.
# Here we want to group the cells based on their (sample-) id
group_by(id) %>%
# We grouped the table by id (the sample)
# Now we define perc as "n" divided by the sum of all "n"s times 100
# This is applied _inside each group_
# After each group has only two rows (Tcell = TRUE or FALSE)
# This is effectively the percentage of T-cells in each sample
mutate(perc = n / sum(n) * 100) %>%
# We do not need the grouping anymore, therefore remove it
ungroup() %>%
# Select only Tcells (As "Tcell" is a True/False column)
filter(Tcell) %>%
# plot the just calculated percentage on the y-axis
# And the severity on the x-axis
# "fill" fills plots, usually used for coloring.
ggplot(aes(y = perc, x = sev_merge, fill = sev_merge)) +
# Make a boxplot per fill-value. That happens automatically.
# Alpha sets the color-density
geom_boxplot(position = position_dodge(1), alpha = 0.7) +
# Overlay dotplot to the barplots
geom_dotplot(
binaxis = "y", stackdir = "center",
position = position_dodge(1), alpha = 0.7
) +
# facet_grid splits the plots according to multiple values,
# you could even use multiple splitting-values
# with "free_x" we allow the x-axis to vary between plots, but the y-axis is fixed.
facet_grid(~Disease.phase, space = "free_x", scale = "free_x") +
# Previously we defined the coloring for the severity stages, here we apply it.
scale_fill_manual(values = color_severity) +
# Set y-axis label
ylab("Percentage of T cells (%)") +
# Set x-axis label to empty ("")
xlab("") +
# Remove the x-axis text and ticks.
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
We are actually interested in finding T cell communities but in each main T cell compartment (CD4+, CD8+ and TCRgd+). For this, we manually gate T cells based on CD8 and TCRgd markers.
Gates were defined per CyTOF run. Due to time restrictions, we are providing the gates in asinh-scale. Eg.: If the cell has an asinh-signal of TCRgd lower or equal than the threshold, and a asinh-signal of CD8 lower or equal than the corresponding CD8 threshold, then the cell is CD4+.
We then create a new column “Tcellcompartment” and classify the cell according to the gates.
gates <- tibble(
# Each "run" has different gates we found manually.
Run = as.numeric(str_sort(unique(data_norm_sub_trans$Run))),
CD8threshold = c(3.9, 4.15, 3.7, 3.7, 4.3, 4.1, 3.8, 4.4, 5.1, 4.85, 4.55, 4.55, 4, 5, 4.2),
TCRgdthreshold = c(2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 2.85, 3.7)
)
data_Tcell <- data_norm_sub_trans %>%
# filter (keep) all rows where Tcell is TRUE
filter(Tcell)
data_Tcell <- data_Tcell %>%
# Show what happens here using left join
# Left join takes the "left" argument, here given implicitely with "%>%", so it is data_Tcell
# Then the dataframe "gates" is joined with data_Tcell using "Run" as "join"-value
left_join(gates) %>%
# According to the gates we assign a T cell compartment label to each cell
mutate(Tcellcompartment = case_when(
TCRgd <= TCRgdthreshold & CD8 <= CD8threshold ~ "CD4+",
TCRgd <= TCRgdthreshold & CD8 > CD8threshold ~ "CD8+",
TRUE ~ "TCRgd+"
))
data_Tcell %>%
# Plot CD8 (x-axis) and TCRgd (y-axis)
ggplot(aes(x = CD8, y = TCRgd)) +
# Use points, color them according to Tcellcompartment with certain size and alpha.
geom_point(aes(color = Tcellcompartment), size = 0.5, alpha = 0.5) +
# Override legend
guides(colour = guide_legend(override.aes = list(size = 5)))
data_Tcell %>%
# Count how many cells there are in every Tcellcompartment
count(Tcellcompartment) %>%
# Tcellcompartment n
# 1 CD4+ 66528
# 2 CD8+ 36605
# 3 TCRgd+ 2354
# We define perc as "n" divided by the sum of all "n"s times 100
# There is 3 "n"-values (because 3 rows), therefore this is effectively
# the percentage of cells in each Tcellcompartment
mutate(perc = n / sum(n) * 100) %>%
# Plot the percentage (y-axis) of each Tcellcompartment (x-axis) colored by Tcellcompartment
# and label it with the percentage, rounded to 2 decimal points
ggplot(aes(x = Tcellcompartment, y = perc, fill = Tcellcompartment, label = round(perc, 2))) +
# We use geom_col to plot the number of cells per compartment as column bars. Dodge means the bars are not stacked, but separated bars.
geom_col(position = "dodge") +
# We defined WHAT the labels should contain, now we actually plot them
geom_label() +
# Rotate the x-axis label, remove the legend
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), legend.position = "none") +
# Set the y-axis label
ylab("Percentage of each T cell compartment from all T cells (%)") +
# Remove the x-axis label
xlab("")
https://pair-code.github.io/understanding-umap/
We now compute the UMAPs for each T cell compartment across all samples (acute and convalescent). We use all markers, except the markers we used for gating. We also exclude IgD, IgM, CD21, and CD14.
We define a vector “sel_markers_Tcells” removing all markers we do not want to include for the calculation of the UMAP.
# panel %in% c("text1", "text2", ...)
# Returns a vector which is TRUE if its corresponding value in "panel"
# is in the vector on the right and FALSE if it is not.
# ! panel %in% c("text1", "text2", ...)
# inverts this logical vector, so FALSE if value is contained in the right, TRUE if not.
# panel[!panel %in% c("CD45","CD3", "CD19","CD15","CD8","TCRgd","IgD","IgM","CD21","CD14")]
# selects the TRUE values in "panel" according to the just specified vector.
sel_markers_Tcells <- panel[!panel %in% c(
"CD45", "CD3", "CD19", "CD15", "CD8",
"TCRgd", "IgD", "IgM", "CD21", "CD14"
)]
Before calculating the UMAP, is important we transform our data (asinh) and apply z-score standardization (function scale). Why?
transformed_scaled_data <- data_Tcell %>%
# Then apply scale() on all columns defined by sel_markers_Tcells
mutate_at(vars(sel_markers_Tcells), scale)
UMAP_TCRgd <- transformed_scaled_data %>%
# Filter all cells which are TCRgd+
filter(Tcellcompartment == "TCRgd+") %>%
# Select only the sel_markers_Tcells columns
select(sel_markers_Tcells) %>%
# Calculate UMAP with defined hyperparameters
uwot::umap(
# The size of local neighborhood
n_neighbors = 30,
# The effective scale of embedded points. In combination with min_dist,
# this determines how clustered/clumped the embedded points are.
spread = 1,
# The effective minimum distance between embedded points
min_dist = 0.5,
# Type of distance metric to use to find nearest neighbors
metric = "euclidean",
# If TRUE, log details to the console.
verbose = TRUE,
# Setting fast_sgd to TRUE will speed up the stochastic optimization phase, but give a
# potentially less accurate embedding, and which will not be exactly reproducible
# even with a fixed seed.
fast_sgd = TRUE
)
UMAP_CD4 <- transformed_scaled_data %>%
# Filter all cells which are CD4+
filter(Tcellcompartment == "CD4+") %>%
# Select only the sel_markers_Tcells columns
select(sel_markers_Tcells) %>%
# Calculate UMAP with defined hyperparameters
uwot::umap(
n_neighbors = 30,
spread = 1,
min_dist = 0.5,
metric = "euclidean",
verbose = TRUE,
fast_sgd = TRUE
)
UMAP_CD8 <- transformed_scaled_data %>%
# Filter all cells which are CD8+
filter(Tcellcompartment == "CD8+") %>%
# Select only the sel_markers_Tcells columns
select(sel_markers_Tcells) %>%
# Calculate UMAP with defined hyperparameters
uwot::umap(
n_neighbors = 30,
spread = 1,
min_dist = 0.5,
metric = "euclidean",
verbose = TRUE,
fast_sgd = TRUE
)
#### Add UMAP information to the complete data frame
transformed_scaled_data$UMAP1 <- NA
transformed_scaled_data$UMAP2 <- NA
transformed_scaled_data$UMAP1[transformed_scaled_data$Tcellcompartment == "CD4+"] <- UMAP_CD4[, 1]
transformed_scaled_data$UMAP1[transformed_scaled_data$Tcellcompartment == "CD8+"] <- UMAP_CD8[, 1]
transformed_scaled_data$UMAP1[transformed_scaled_data$Tcellcompartment == "TCRgd+"] <- UMAP_TCRgd[, 1]
transformed_scaled_data$UMAP2[transformed_scaled_data$Tcellcompartment == "CD4+"] <- UMAP_CD4[, 2]
transformed_scaled_data$UMAP2[transformed_scaled_data$Tcellcompartment == "CD8+"] <- UMAP_CD8[, 2]
transformed_scaled_data$UMAP2[transformed_scaled_data$Tcellcompartment == "TCRgd+"] <- UMAP_TCRgd[, 2]
We now plot each UMAP, coloured by severity, disease phase, and intensity of a selected marker. Recommendation: Subsample CD4+ and CD8+ T cells for visualization, using sample_n(), or sample_frac.
Do you observe specific areas where CV19 samples accumulate? What does this mean?
p1 <- transformed_scaled_data %>%
# Select CD4+ Tcell compartment
filter(Tcellcompartment == "CD4+") %>%
# Sample 30000 rows (=cells)
sample_n(30000) %>%
# Plot the two just calculated UMAPs, color according to sev_merge(severity)
ggplot(aes(x = UMAP1, y = UMAP2, color = sev_merge)) +
# Plot points
geom_point(alpha = 0.5, size = 2) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Use our previously defined colors
scale_color_manual(values = color_severity, name = "") +
# Set a proper title!
# Always set proper titles. The idea is that you should be able to send a plot to you boss and
# he should understand what happens by just looking at the plot.
ggtitle("CD4+ T cells") +
# legend.position="none" removes the legend
# plot.title = element_text(hjust...) adjusts the horizontal adjustment (placing) of the title.
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))
p2 <- transformed_scaled_data %>%
# Select CD8+ Tcell compartment
filter(Tcellcompartment == "CD8+") %>%
# Sample 30000 rows (=cells)
sample_n(30000) %>%
# Plot the two just calculated UMAPs, color according to sev_merge(severity)
ggplot(aes(x = UMAP1, y = UMAP2, color = sev_merge)) +
# Plot points
geom_point(alpha = 0.5, size = 2) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Use our previously defined colors
scale_color_manual(values = color_severity, name = "") +
# Set a proper title!
# Always set proper titles. The idea is that you should be able to send a plot to you boss and
# he should understand what happens by just looking at the plot.
ggtitle("CD8+ T cells") +
# legend.position="none" removes the legend
# plot.title = element_text(hjust...) adjusts the horizontal adjustment (placing) of the title.
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))
p3 <- transformed_scaled_data %>%
# Select TCRgd+ Tcell compartment
filter(Tcellcompartment == "TCRgd+") %>%
# Plot the two just calculated UMAPs, color according to sev_merge(severity)
ggplot(aes(x = UMAP1, y = UMAP2, color = sev_merge)) +
# Plot points
geom_point(alpha = 0.5, size = 3) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Use our previously defined colors
scale_color_manual(values = color_severity, name = "") +
# Set a proper title!
# Always set proper titles. The idea is that you should be able to send a plot to you boss and
# he should understand what happens by just looking at the plot.
ggtitle("TCRgd+ T cells") +
# The last plot SHOULD have a legend.
# Note that technically this legend is only legit for p3 not for p2 or p1 because the color
# limits might be different
theme(
legend.position = "right", legend.direction = "vertical",
plot.title = element_text(hjust = 0.5)
)
# Until now we did CALCULATE the plots, but not actually plot them, that is the idea of ggplot.
# First BUILD the plots, only plot them at the very end.
# With plot_grid(..., nrow=1) we define that all sub-plots (p1, p2, p3) are plotted in one row.
plot_grid(p1, p2, p3, nrow = 1)
Do you observe specific areas where convalescent samples accumulate? What does this mean?
p1 <- transformed_scaled_data %>%
# Keep "CD4+" and only covid19 samples (Group == "CV19")
filter(Tcellcompartment == "CD4+", Group == "CV19") %>%
# Plot the two just calculated UMAPs, color according to Disease.phase
ggplot(aes(x = UMAP1, y = UMAP2, color = Disease.phase)) +
# Plot points
geom_point(alpha = 0.5, size = 2) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Define manual colors. We defined that the column "Disease.phase" should define the color
# Inside Disease.phase are the values "acute" and "convalescent"
# With scale_color_manual we can explicitly define which color they should get.
scale_color_manual(values = c("acute" = "red", "convalescent" = "black"), name = "") +
# Setting a proper title!
ggtitle("CD4+ T cells") +
# legend.position="none" removes the legend
# plot.title = element_text(hjust...) adjusts the horizontal adjustment (placing) of the title.
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))
p2 <- transformed_scaled_data %>%
# Keep "CD8+" and only covid19 samples (Group == "CV19")
filter(Tcellcompartment == "CD8+", Group == "CV19") %>%
# Plot the two just calculated UMAPs, color according to Disease.phase
ggplot(aes(x = UMAP1, y = UMAP2, color = Disease.phase)) +
# Plot points
geom_point(alpha = 0.5, size = 2) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Define manual colors. We defined that the column "Disease.phase" should define the color
# Inside Disease.phase are the values "acute" and "convalescent"
# With scale_color_manual we can explicitly define which color they should get.
scale_color_manual(values = c("acute" = "red", "convalescent" = "black"), name = "") +
# Setting a proper title!
ggtitle("CD8+ T cells") +
# legend.position="none" removes the legend
# plot.title = element_text(hjust...) adjusts the horizontal adjustment (placing) of the title.
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))
p3 <- transformed_scaled_data %>%
# Keep "TCRgd+" and only covid19 samples (Group == "CV19")
filter(Tcellcompartment == "TCRgd+", Group == "CV19") %>%
# Plot the two just calculated UMAPs, color according to Disease.phase
ggplot(aes(x = UMAP1, y = UMAP2, color = Disease.phase)) +
# Plot points
geom_point(alpha = 0.5, size = 3) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Define manual colors. We defined that the column "Disease.phase" should define the color
# Inside Disease.phase are the values "acute" and "convalescent"
# With scale_color_manual we can explicitly define which color they should get.
scale_color_manual(values = c("acute" = "red", "convalescent" = "black"), name = "") +
# Setting a proper title!
ggtitle("TCRgd+ T cells") +
# The last plot SHOULD have a legend.
# Note that technically this legend is only legit for p3 not for p2 or p1 because the color
# limits might be different
theme(plot.title = element_text(hjust = 0.5))
# Until now we did CALCULATE the plots, but not actually plot them, that is the idea of ggplot.
# First BUILD the plots, only plot them at the very end.
# With plot_grid(..., nrow=1) we define that all sub-plots (p1, p2, p3) are plotted in one row.
plot_grid(p1, p2, p3, nrow = 1)
p1 <- transformed_scaled_data %>%
# Keep "CD4+" cells
filter(Tcellcompartment == "CD4+") %>%
# Subsample 30000 cells
sample_n(30000) %>%
# Plot the two just calculated UMAPs, color according to HLADR values
ggplot(aes(x = UMAP1, y = UMAP2, color = HLADR)) +
# Make points
geom_point(alpha = 0.5, size = 2) +
# Setting proper title
ggtitle("CD4+ T cells") +
# No legend
theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
# scale_color_gradient2 sets the color-scale for the aes "color"
# Where you can define which colors low, mid and high values should have
# Then it makes two color-gradients:
# low -> mid
# mid -> low
# Mixing the colors appropriately according to the value of the given column in aes()
scale_color_gradient2(low = "blue", mid = "grey", high = "red", midpoint = 0)
p2 <- transformed_scaled_data %>%
# Keep "CD8+" cells
filter(Tcellcompartment == "CD8+") %>%
# Subsample 30000 cells
sample_n(30000) %>%
# Plot the two just calculated UMAPs, color according to HLADR values
ggplot(aes(x = UMAP1, y = UMAP2, color = HLADR)) +
# Make points
geom_point(alpha = 0.5, size = 2) +
# Setting proper title
ggtitle("CD8+ T cells") +
# No legend
theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
# scale_color_gradient2 sets the color-scale for the aes "color"
# Where you can define which colors low, mid and high values should have
# Then it makes two color-gradients:
# low -> mid
# mid -> low
# Mixing the colors appropriately according to the value of the given column in aes()
scale_color_gradient2(low = "blue", mid = "grey", high = "red", midpoint = 0)
p3 <- transformed_scaled_data %>%
# Keep "TCRgd+" cells
filter(Tcellcompartment == "TCRgd+") %>%
# Plot the two just calculated UMAPs, color according to HLADR values
ggplot(aes(x = UMAP1, y = UMAP2, color = HLADR)) +
geom_point(alpha = 0.5, size = 3) +
# Setting proper title
ggtitle("TCRgd+ T cells") +
# Last plot, having legend again (Mind that color scale can be different again!)
theme(plot.title = element_text(hjust = 0.5)) +
# scale_color_gradient2 sets the color-scale for the aes "color"
# Where you can define which colors low, mid and high values should have
# Then it makes two color-gradients:
# low -> mid
# mid -> low
# Mixing the colors appropriately according to the value of the given column in aes()
scale_color_gradient2(low = "blue", mid = "grey", high = "red", midpoint = 0)
# Until now we did CALCULATE the plots, but not actually plot them, that is the idea of ggplot.
# First BUILD the plots, only plot them at the very end.
# With plot_grid(..., nrow=1) we define that all sub-plots (p1, p2, p3) are plotted in one row.
plot_grid(p1, p2, p3, nrow = 1)
We now perform unsupervised clustering analysis on samples from control, FLI, HIV, HBV, and acute COVID-19 using the selected markers. As done for the UMAP calculation, before clustering is important we transform and z-score normalize our data. Let’s look at the distribution of the markers used for clustering:
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells
select(sel_markers_Tcells) %>%
# Make the table from wide to long format
# The previous column names go into the new column "marker"
# The previous column's _values_ go into the new column "marker"
# And this should be applied to all (everything()) previous columns
pivot_longer(names_to = "marker", values_to = "value", everything()) %>%
# "value" on the x-axis
ggplot(aes(x = value)) +
# Make density plots
geom_density() +
# Split the plots according to the "marker" value, scale them freely
facet_wrap(~marker, scale = "free")
Due to time constraints, we will only cluster one T cell compartment, eg.: CD8+ T cells.
What’s the main difference between RPhenograph and FlowSOM?
IMPORTANT! In RPhenograph we define the number of nearest neighbours.
# Sys.time() returns the current time on your PC
start_time <- Sys.time()
clust_cd8_rpheno <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells
select(sel_markers_Tcells) %>%
# Start clustering with Rphenograph
# k: number of nearest neighbours (default:30)
Rphenograph(k = 30)
# Sys.time() returns the current time on your PC
end_time <- Sys.time()
# with difftime you can calculate the difference between two times received by Sys.time()
# You can also supply the unit you want to see (here "minutes")
#
# as.numeric() then makes a PC-usable number from that
# and round(..., 2) rounds to the second place after comma
time_elapsed <- round(as.numeric(difftime(end_time, start_time, units = "mins")), 2)
# paste() combines multiple strings together in R
# "\n" is a special string to make a new line when
# cat() is used to actually print to console
cat(paste("\n", time_elapsed, "minutes passed"))
##
## 1.84 minutes passed
After running the clustering algorithm, we add a column “Rpheno” in the data table with the cluster label for each cell:
cell_ids <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Extract the column "cellid" from the dataframe as a usual vector
# This column uniquely identifies each cell
pull(cellid)
# Make a new tibble (table) with two columns:
# "cellid": The (unique!) cell ids
# "Rpheno": The cluster assignment for each of the cells
clust_cd8_rpheno_assignment <- tibble(
cellid = cell_ids,
Rpheno = as.character(igraph::membership(clust_cd8_rpheno))
)
# left_join here joins automatically by the column "cellid"
# So the column "cellid" is taken as identifier and the transformed_scaled_data is extended by the
# column "Rpheno", so the cluster assignment by Rpheno
transformed_scaled_data <- transformed_scaled_data %>% left_join(clust_cd8_rpheno_assignment)
# If you are worried that some now joined cluster-values are NA:
# Mind that transformed_scaled_data holds _all_ cells but we only clustered on the CD8+
# cells from acute samples!
transformed_scaled_data <- transformed_scaled_data %>%
# We will use Rpheno later in plots and want to have sorted cluster numbers.
# ggplot can only do that when the column is a factor
# If the column is a factor, the ordering of values is according to its levels
mutate(Rpheno = factor(Rpheno, levels = str_sort(unique(Rpheno), numeric = TRUE)))
IMPORTANT! In FlowSOM we define the number of clusters.
# Sys.time() returns the current time on your PC
start_time <- Sys.time()
clust_cd8_flowsom <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells
select(sel_markers_Tcells) %>%
# Start clustering with FlowSOM (from the cytofkit package)
cytofkit::cytof_cluster(
# %>% usually "pipes" the data into the FIRST argument of the called function
# but here we want to leave the first argument ("ydata") free, so we have to define
# it explicitely.
# To then "access" the piped data you can use the dot (.)
xdata = .,
# Specify the used method, here FlowSOM
method = "FlowSOM",
# Number of clusters for meta clustering in FlowSOM.
FlowSOM_k = 10
)
# Sys.time() returns the current time on your PC
end_time <- Sys.time()
# with difftime you can calculate the difference between two times received by Sys.time()
# You can also supply the unit you want to see (here "minutes")
#
# as.numeric() then makes a PC-usable number from that
# and round(..., 2) rounds to the second place after comma
time_elapsed <- round(as.numeric(difftime(end_time, start_time, units = "mins")), 2)
# paste() combines multiple strings together in R
# "\n" is a special string to make a new line when
# cat() is used to actually print to console
print(paste("\n", time_elapsed, "minutes passed"))
## [1] "\n 0.04 minutes passed"
After running the clustering algorithm, we add a column “flowsom” in the data table with the cluster label for each cell:
cell_ids <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Extract the column "cellid" from the dataframe as a usual vector
# This column uniquely identifies each cell
pull(cellid)
# Make a new tibble (table) with two columns:
# "cellid": The (unique!) cell ids
# "flowsom": The cluster assignment for each of the cells
clust_cd8_flowsom_assignment <- tibble(
cellid = cell_ids,
flowsom = as.character(clust_cd8_flowsom)
)
# If you are worried that some now joined cluster-values are NA:
# Mind that transformed_scaled_data holds _all_ cells but we only clustered on the CD8+
# cells from acute samples!
# left_join here joins automatically by the column "cellid"
# So the column "cellid" is taken as identifier and the transformed_scaled_data is extended by the
# column "flowsom", so the cluster assignment by flowsom
transformed_scaled_data <- transformed_scaled_data %>% left_join(clust_cd8_flowsom_assignment)
transformed_scaled_data <- transformed_scaled_data %>%
# We will use flowsom later in plots and want to have sorted cluster numbers.
# ggplot can only do that when the column is a factor
# If the column is a factor, the ordering of values is according to its levels
mutate(flowsom = factor(flowsom, levels = str_sort(unique(flowsom), numeric = TRUE)))
We now visualize the number of cells in each cluster:
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Count the number of rows (cells) per value in the Rpheno (cluster-id) column
count(Rpheno) %>%
# Plot
# - Rpheno column (cluster-id) on the x-axis
# - Number of cells per Rpheno on y-axis
# - Use the number of cells as label
ggplot(aes(x = Rpheno, y = n, label = n)) +
# Make a bar chart
geom_col(position = "dodge") +
# Add labels on each bar
geom_label()
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Count the number of rows (cells) per value in the flowsom (cluster-id) column
count(flowsom) %>%
# Plot
# - flowsom column (cluster-id) on the x-axis
# - Number of cells per flowsom on y-axis
# - Use the number of cells as label
ggplot(aes(x = flowsom, y = n, label = n)) +
# Make a bar chart
geom_col(position = "dodge") +
# Add labels on each bar
geom_label()
And then plot the UMAP, this time coloured by cluster:
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Plot the two just calculated UMAPs, color according to Rphenograph clusters
ggplot(aes(x = UMAP1, y = UMAP2, color = Rpheno)) +
# Plot points with transparency and size defined
geom_point(alpha = 0.5, size = 1) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Setting proper title
ggtitle("CD8+ T cells") +
# Center the plot title
theme(plot.title = element_text(hjust = 0.5))
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Plot the two just calculated UMAPs, color according to flowsom clusters
ggplot(aes(x = UMAP1, y = UMAP2, color = flowsom)) +
# Plot points with transparency and size defined
geom_point(alpha = 0.5, size = 1) +
# Bigger and not transparent dots for the legend
guides(colour = guide_legend(ncol = 1, override.aes = list(size = 3, alpha = 1))) +
# Setting proper title
ggtitle("CD8+ T cells") +
# Center the plot title
theme(plot.title = element_text(hjust = 0.5))
Let’s look at the marker distribution per cluster:
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ Rpheno
select(sel_markers_Tcells, Rpheno) %>%
# Make the table from wide to long format
# The previous column names go into the new column "marker"
# The previous column's _values_ go into the new column "marker"
# And this should be applied to the columns defined by sel_markers_Tcells
pivot_longer(names_to = "marker", values_to = "value", sel_markers_Tcells) %>%
# x-axis: Marker values
# y-axis: Cluster ids
# fill-color: Cluster ids
ggplot(aes(x = value, y = Rpheno, fill = Rpheno)) +
# geom_density_ridges plots a density plot for each value defined by y
ggridges::geom_density_ridges() +
# Make the same kind of multi-density-plot for each marker!
facet_wrap(~marker, scale = "free")
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ flowsom
select(sel_markers_Tcells, flowsom) %>%
# Make the table from wide to long format
# The previous column names go into the new column "marker"
# The previous column's _values_ go into the new column "marker"
# And this should be applied to the columns defined by sel_markers_Tcells
pivot_longer(names_to = "marker", values_to = "value", sel_markers_Tcells) %>%
# x-axis: Marker values
# y-axis: Cluster ids
# fill-color: Cluster ids
ggplot(aes(x = value, y = flowsom, fill = flowsom)) +
# geom_density_ridges plots a density plot for each value defined by y
geom_density_ridges() +
# Make the same kind of multi-density-plot for each marker!
facet_wrap(~marker, scale = "free")
We now want to actually understand what are these clusters we found. For this, we can visualize the average expression of each marker in each cluster with a heatmap.
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+", Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ Rpheno
select(sel_markers_Tcells, Rpheno) %>%
# group by Rpheno-clusters
group_by(Rpheno) %>%
# Summarise_at "summarises" all rows (per group!) using the functions defined in list()
# "~mean(...)" is a so-called lambda function (anonymous)
# In essence: Calculate the mean _per marker_ inside each cluster (because group_by)
summarise_at(vars(sel_markers_Tcells), list(~ mean(., na.rm = TRUE))) %>%
# Make the table from wide to long format
# The previous column names go into the new column "marker"
# The previous column's _values_ go into the new column "marker"
# And this should be applied to all columns EXCEPT Rpheno (the "-" tells that)
pivot_longer(names_to = "markers", values_to = "avg_zscore", -Rpheno) %>%
# Modify the column "markers" into being a factor with a specified order of levels
# (the reverse order from sel_markers_Tcells)
mutate(markers = factor(markers, levels = rev(sel_markers_Tcells))) %>%
# Start plot:
# - x-axis: Cluster
# - y-axis: The marker
# - fill color: The just calculated average score
ggplot(aes(x = Rpheno, y = markers, fill = avg_zscore)) +
# Make a heatmap
geom_tile() +
# Replace the default color scale
scale_fill_gradient2(low = "blue", mid = "white", high = "red", limits = c(-3.5, 3.5))
transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+" & Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ flowsom
select(sel_markers_Tcells, flowsom) %>%
# group by flowsom-clusters
group_by(flowsom) %>%
# Summarise_at "summarises" all rows (per group!) using the functions defined in list()
# "~mean(...)" is a so-called lambda function (anonymous)
# In essence: Calculate the mean _per marker_ inside each cluster (because group_by)
summarise_at(vars(sel_markers_Tcells), funs(mean(., na.rm = TRUE))) %>%
# Make the table from wide to long format
# The previous column names go into the new column "marker"
# The previous column's _values_ go into the new column "marker"
# And this should be applied to all columns EXCEPT flowsom (the "-" tells that)
pivot_longer(names_to = "markers", values_to = "avg_zscore", -flowsom) %>%
# Modify the column "markers" into being a factor with a specified order of levels
# (the reverse order from sel_markers_Tcells)
mutate(markers = factor(markers, levels = rev(sel_markers_Tcells))) %>%
# Start plot:
# - x-axis: Cluster
# - y-axis: The marker
# - fill color: The just calculated average score
ggplot(aes(x = flowsom, y = markers, fill = avg_zscore)) +
# Make a heatmap
geom_tile() +
# Replace the default color scale
scale_fill_gradient2(low = "blue", mid = "white", high = "red", limits = c(-3.5, 3.5))
With the library “pheatmap” we can additionally group the clusters (dendogram) and group the markers by category (annotation).
data_heatmap <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+" & Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ Rpheno
select(sel_markers_Tcells, Rpheno) %>%
# group by Rpheno-clusters
group_by(Rpheno) %>%
# Summarise_at "summarises" all rows (per group!) using the functions defined in list()
# "~mean(...)" is a so-called lambda function (anonymous)
# In essence: Calculate the mean _per marker_ inside each cluster (because group_by)
summarise_at(vars(sel_markers_Tcells), list(~mean(., na.rm = TRUE))) %>%
# tibbles usually do not have row-names, but some (usually classical) functions
# make use of the rownames
# Therefore, let's make the Rpheno-column the rownames!
column_to_rownames("Rpheno")
# Create a data frame for the marker categories
annotation_rows <- data.frame(
marker_category = c(
rep("Differentiation", 2),
rep("Co-stimulation", 4),
rep("Co-inhibition", 4),
rep("Activation", 7),
rep("Chemokine receptor", 4),
rep("Others", 9)
)
)
# The sel_markers_Tcells should be the rownames of the annotations
rownames(annotation_rows) <- sel_markers_Tcells
data_heatmap %>%
# t() transposes data_heatmap
# apart from that it automatically converts the tibble into a matrix
t() %>%
pheatmap(
# Do not cluster the rows when plotting
cluster_rows = FALSE,
# Do cluster the columns when plotting
cluster_cols = TRUE,
# Add the annotation per row
annotation_row = annotation_rows,
# Do not show the annotation_row column name
annotation_names_row = FALSE,
# vector of row indices that show where to put gaps into heatmap. Used only if the rows are not clustered. See cutree_row to see how to introduce gaps to clustered rows.
gaps_row = c(2, 6, 10, 17, 21),
# a sequence of numbers that covers the range of values in mat and
# is one element longer than color vector.
# Used for mapping values to colors.
breaks = seq(-3.5, 3.5, length.out = 100),
# For every value in "breaks", generate a corresponding color value with colorRampPalette
# This then goes
# blue (minimum) -> white (midpoint) -> red (maximum)
color = colorRampPalette(c("blue", "white", "red"))(100)
)
data_heatmap <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+" & Disease.phase == "acute") %>%
# Keep only the columns defined by sel_markers_Tcells _and_ flowsom
select(sel_markers_Tcells, flowsom) %>%
# group by flowsom-clusters
group_by(flowsom) %>%
# Summarise_at "summarises" all rows (per group!) using the functions defined in list()
# "~mean(...)" is a so-called lambda function (anonymous)
# In essence: Calculate the mean _per marker_ inside each cluster (because group_by)
summarise_at(vars(sel_markers_Tcells), list(~mean(., na.rm = TRUE))) %>%
# tibbles usually do not have row-names, but some (usually classical) functions
# make use of the rownames
# Therefore, let's make the flowsom-column the rownames!
column_to_rownames("flowsom")
# Create a data frame for the marker categories
annotation_rows <- data.frame(marker_category = c(
rep("Differentiation", 2),
rep("Co-stimulation", 4),
rep("Co-inhibition", 4),
rep("Activation", 7),
rep("Chemokine receptor", 4),
rep("Others", 9)
))
# The sel_markers_Tcells should be the rownames of the annotations
rownames(annotation_rows) <- sel_markers_Tcells
data_heatmap %>%
# t() transposes data_heatmap
# apart from that it automatically converts the tibble into a matrix
t() %>%
pheatmap(
# Do not cluster the rows when plotting
cluster_rows = FALSE,
# Do cluster the columns when plotting
cluster_cols = TRUE,
# Add the annotation per row
annotation_row = annotation_rows,
# Do not show the annotation_row column name
annotation_names_row = FALSE,
# vector of row indices that show where to put gaps into heatmap. Used only if the rows are not clustered. See cutree_row to see how to introduce gaps to clustered rows.
gaps_row = c(2, 6, 10, 17, 21),
# a sequence of numbers that covers the range of values in mat and
# is one element longer than color vector.
# Used for mapping values to colors.
breaks = seq(-3.5, 3.5, length.out = 100),
# For every value in "breaks", generate a corresponding color value with colorRampPalette
# This then goes
# blue (minimum) -> white (midpoint) -> red (maximum)
color = colorRampPalette(c("blue", "white", "red"))(100)
)
How much percentage of cells from RPhenograph cluster X where classified in FlowSOM cluster Y?
data_heatmap <- transformed_scaled_data %>%
# Keep "CD8+" cells from "acute" disease phase
filter(Tcellcompartment == "CD8+" & Disease.phase == "acute") %>%
# count for each combination of
# Rpheno: Cluster from Rphenograph
# flowsom: Cluster from flowsom
# How many cells there are
# Count saves that in a new column "n"
count(Rpheno, flowsom) %>%
# When using the function count(), if a cluster is absent in a donor, it will not be counted as zero.
# So we complete the count table by filling the missing clusters with a 0.
tidyr::complete(Rpheno, flowsom, fill = list(n = 0)) %>%
# group by Rpheno-clusters
group_by(Rpheno) %>%
# due to the grouping, sum(n) gets the number of cells in each Rpheno-cluster
mutate(total_Rpheno = sum(n)) %>%
# Remove the grouping
ungroup() %>%
# Get the percentage of cells (per row) in total_Rpheno
# After total_Rpheno was calculated per Rpheno-cluster this results in the percentage
# of cells of each flowSOM cluster being in the respective Rpheno-cluster
mutate(perc = n / total_Rpheno * 100) %>%
# # A tibble: 100 x 5
# Rpheno flowsom n total_Rpheno perc
# <fct> <fct> <int> <int> <dbl>
# 1 1 1 1 2958 0.0338
# 2 1 2 112 2958 3.79
# 3 1 3 2648 2958 89.5
# Select only the columns Rpheno, flowsom, perc
select(Rpheno, flowsom, perc) %>%
# Rpheno flowsom perc
# <fct> <fct> <dbl>
# 1 1 1 0.0338
# 2 1 2 3.79
# 3 1 3 89.5
# Now generate a matrix-like visualization by moving the Rpheno-column into separate columns
# based on the Rpheno-value
pivot_wider(names_from = Rpheno, values_from = "perc") %>%
# Move the flowsom column to the rownames.
column_to_rownames("flowsom")
pheatmap(
data_heatmap,
# Do cluster the rows when plotting
cluster_rows = TRUE,
# Do cluster the columns when plotting
cluster_cols = TRUE,
# Create custom labels for the rows
labels_row = paste0(rownames(data_heatmap), "_flowsom"),
# Create custom labels for the columns
labels_col = paste0(rownames(data_heatmap), "_rpheno")
)
Finally, we can calculate the abundance of each cluster in each sample and check if there are COVID-specific or severity-specific groups of cells.
ACHTUNG! In this data set, donors where sampled multiple times during the disease course. We establish an arbitrary rule of choosing the first sample per donor (usually during the first week post-symptom onset) if multiple samples are available.
selected_ids <- transformed_scaled_data %>%
# Get only cells from the acute disease phase
filter(Disease.phase == "acute") %>%
# count for each combination of
# Individuals: Patient id (there are potentially multiple samples per patient)
# id: Sample id
# Group: "HC" "CV19" "HBV" "HIV" "FLI"
# sev_merge: Which severity it is (sev_merge = mild+moderate and severe+critical merged)
# Days.post.symptom.onset: Time of measurement after symptom onset
# How many cells there are
# Count saves that in a new column "n"
count(Individuals, id, Group, sev_merge, Days.post.symptom.onset) %>%
# Remove the "n" column
select(-n) %>%
# Group by Individual
group_by(Individuals) %>%
# Inside each Individual, create a new column "n_samples"
# which always starts from 1 and goes up to the number of samples of that individual (which is the number of rows in that group)
mutate(n_samples = 1:n()) %>%
# Select the id where the Days.post.symptom.onset is the minimum
mutate(select_id = ifelse(n_samples == 1, TRUE,
ifelse(min(as.numeric(Days.post.symptom.onset)) == as.numeric(Days.post.symptom.onset),
TRUE,
FALSE
)
)) %>%
# Get only the rows where select_id column is TRUE
filter(select_id) %>%
# Get the ids
pull(id)
We can visualize the abundance of each cluster per donor as boxplots per severity group.
transformed_scaled_data %>%
# Keep all samples (id) which are in the selected_ids AND in the Tcellcompartment CD8+
filter(id %in% selected_ids, Tcellcompartment == "CD8+") %>%
# count for each combination of
# id: Sample id
# sev_merge: Which severity it is (sev_merge = mild+moderate and severe+critical merged)
# Tcellcompartment: We selected only CD8+ but
# Rpheno: Cluster from Rphenograph
# How many cells there are
# Count saves that in a new column "n"
count(id, sev_merge, Tcellcompartment, Rpheno) %>%
# Group by sample ID and Tcellcompartment
group_by(id, Tcellcompartment) %>%
# Get the percentage of rows per group (so... per sample)
mutate(perc = n / sum(n) * 100) %>%
# Remove grouping
ungroup() %>%
# When using the function count(), if a cluster is absent in a donor, it will not be counted as zero.
# So we complete the count table by filling the missing clusters with a 0.
tidyr::complete(id, Rpheno, fill = list(n = 0, perc = 0)) %>%
# tidyr::complete did not fill Tcellcompartment, therefore fill it manually
# 1. Remove all NA (missing values) from Tcellcompartment: Tcellcompartment[!is.na(Tcellcompartment)]
# 2. Get the unique values (should be only 1!)
# 3. Set all Tcellcompartment values to this unique value
mutate(Tcellcompartment = unique(Tcellcompartment[!is.na(Tcellcompartment)])) %>%
# Group per sample
group_by(id) %>%
# after we grouped per sample, each sample has its own sev_merge value for all clusters!
mutate(sev_merge = unique(sev_merge[!is.na(sev_merge)])) %>%
# Remove grouping
ungroup() %>%
# Plot:
# x-axis: sev_merge
# y-axis: percentage of cells in the cluster
# fill color: sev_merge
ggplot(aes(x = sev_merge, y = perc, fill = sev_merge)) +
# Make boxplots
geom_boxplot() +
# Add the values as points
geom_jitter() +
# Split the previous plot into each separate cluster
facet_wrap(~Rpheno, scale = "free") +
# Previously we defined the coloring for the severity stages, here we apply it.
scale_fill_manual(values = color_severity) +
# Remove the x-axis text
theme(axis.text.x = element_blank()) +
# Set a title
ggtitle("RPhenograph cluster abundance")
transformed_scaled_data %>%
# Keep all samples (id) which are in the selected_ids AND in the Tcellcompartment CD8+
filter(id %in% selected_ids, Tcellcompartment == "CD8+") %>%
# count for each combination of
# id: Sample id
# sev_merge: Which severity it is (sev_merge = mild+moderate and severe+critical merged)
# Tcellcompartment: We selected only CD8+ but
# flowsom: Cluster from flowsom
# How many cells there are
# Count saves that in a new column "n"
count(id, sev_merge, Tcellcompartment, flowsom) %>%
# Group by sample ID and Tcellcompartment
group_by(id, Tcellcompartment) %>%
# Get the percentage of rows per group (so... per sample)
mutate(perc = n / sum(n) * 100) %>%
# Remove grouping
ungroup() %>%
# When using the function count(), if a cluster is absent in a donor, it will not be counted as zero.
# So we complete the count table by filling the missing clusters with a 0.
tidyr::complete(id, flowsom, fill = list(n = 0, perc = 0)) %>%
# tidyr::complete did not fill Tcellcompartment, therefore fill it manually
# 1. Remove all NA (missing values) from Tcellcompartment: Tcellcompartment[!is.na(Tcellcompartment)]
# 2. Get the unique values (should be only 1!)
# 3. Set all Tcellcompartment values to this unique value
mutate(Tcellcompartment = unique(Tcellcompartment[!is.na(Tcellcompartment)])) %>%
# Group per sample
group_by(id) %>%
# after we grouped per sample, each sample has its own sev_merge value for all clusters!
fill(sev_merge, .direction = "downup") %>%
# Remove grouping
ungroup() %>%
# Plot:
# x-axis: sev_merge
# y-axis: percentage of cells in the cluster
# fill color: sev_merge
ggplot(aes(x = sev_merge, y = perc, fill = sev_merge)) +
# Make boxplots
geom_boxplot() +
# Add the values as points
geom_jitter() +
# Split the previous plot into each separate cluster
facet_wrap(~flowsom, scale = "free") +
# Previously we defined the coloring for the severity stages, here we apply it.
scale_fill_manual(values = color_severity) +
# Remove the x-axis text
theme(axis.text.x = element_blank()) +
# Set a title
ggtitle("FlowSom cluster abundance")
Make a presentation/talk (up to 20 minutes long, 10 minutes discussion) about the analysis pipeline that you familiarized yourself with. Mention what cell type you looked at, and present the results with your interpretation given the context of the experimental design.
Doesn’t have to be a proper PowerPoint, you can just open the R Markdown and go through it.
Following questions should be touched upon:
Pre-gating:
UMAP:
Unsupervised Clustering:
What are the differences between algorithms?
Which one gave better results?
What are the situations where you would prefer using the other one?
Cluster annotation
Interpretation of results in the context of acute COVID-19 using heatmaps and cluster abundances.
Future plans:
print("Done")
## [1] "Done"
print(session_info())
## - Session info ---------------------------------------------------------------
## setting value
## version R version 4.1.3 (2022-03-10)
## os Windows 10 x64 (build 22000)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate German_Germany.1252
## ctype German_Germany.1252
## tz Europe/Berlin
## date 2022-04-23
## pandoc 2.14.0.3 @ D:/Programme/RStudio/bin/pandoc/ (via rmarkdown)
##
## - Packages -------------------------------------------------------------------
## ! package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.1.1)
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.3)
## aws.s3 0.3.21 2020-04-07 [1] CRAN (R 4.1.3)
## aws.signature 0.6.0 2020-06-01 [1] CRAN (R 4.1.3)
## backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.1)
## Biobase 2.54.0 2021-10-26 [1] Bioconductor
## BiocGenerics 0.40.0 2021-10-26 [1] Bioconductor
## BiocManager * 1.30.16 2021-06-15 [1] CRAN (R 4.1.3)
## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.1)
## boot 1.3-28 2021-05-03 [1] CRAN (R 4.1.3)
## brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.3)
## broom 0.8.0 2022-04-13 [1] CRAN (R 4.1.3)
## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.3)
## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.3)
## callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.3)
## car 3.0-12 2021-11-06 [1] CRAN (R 4.1.3)
## carData 3.0-5 2022-01-06 [1] CRAN (R 4.1.3)
## caTools 1.18.2 2021-03-28 [1] CRAN (R 4.1.3)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.3)
## circlize 0.4.14 2022-02-11 [1] CRAN (R 4.1.3)
## class 7.3-20 2022-01-16 [1] CRAN (R 4.1.3)
## cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.3)
## clue 0.3-60 2021-10-11 [1] CRAN (R 4.1.3)
## cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.3)
## codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.3)
## colorRamps 2.3 2012-10-29 [1] CRAN (R 4.1.1)
## colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.3)
## colourpicker 1.1.1 2021-10-04 [1] CRAN (R 4.1.3)
## ComplexHeatmap * 2.10.0 2021-10-26 [1] Bioconductor
## ConsensusClusterPlus 1.58.0 2021-10-26 [1] Bioconductor
## cowplot * 1.1.1 2020-12-30 [1] CRAN (R 4.1.3)
## crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.3)
## curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.3)
## cytofkit * 0.99.0 2022-04-23 [1] Github (JinmiaoChenLab/cytofkit@3d13408)
## cytolib 2.6.2 2022-02-08 [1] Bioconductor
## CytoML 2.6.0 2021-10-26 [1] Bioconductor
## data.table * 1.14.2 2021-09-27 [1] CRAN (R 4.1.3)
## DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.3)
## dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.3)
## DelayedArray 0.20.0 2021-10-26 [1] Bioconductor
## DEoptimR 1.0-11 2022-04-03 [1] CRAN (R 4.1.3)
## desc 1.4.1 2022-03-06 [1] CRAN (R 4.1.3)
## destiny 3.8.1 2022-01-30 [1] Bioconductor
## devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.1.3)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.3)
## doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.1.3)
## dplyr * 1.0.8 2022-02-08 [1] CRAN (R 4.1.3)
## e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.3)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.3)
## evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.3)
## fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.3)
## farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.3)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.3)
## flowCore 2.6.0 2021-10-26 [1] Bioconductor
## FlowSOM 2.2.0 2021-10-26 [1] Bioconductor
## flowWorkspace 4.6.0 2021-10-26 [1] Bioconductor
## FNN 1.1.3 2019-02-15 [1] CRAN (R 4.1.3)
## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.3)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.1.3)
## fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.3)
## generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.3)
## GenomeInfoDb 1.30.1 2022-01-30 [1] Bioconductor
## GenomeInfoDbData 1.2.7 2022-04-23 [1] Bioconductor
## GenomicRanges 1.46.1 2021-11-18 [1] Bioconductor
## GetoptLong 1.0.5 2020-12-15 [1] CRAN (R 4.1.3)
## ggcyto 1.22.0 2021-10-26 [1] Bioconductor
## ggforce 0.3.3 2021-03-05 [1] CRAN (R 4.1.3)
## ggnewscale 0.4.7 2022-03-25 [1] CRAN (R 4.1.3)
## ggplot.multistats 1.0.0 2019-10-28 [1] CRAN (R 4.1.3)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.3)
## ggpointdensity 0.1.0 2019-08-28 [1] CRAN (R 4.1.3)
## ggpubr 0.4.0 2020-06-27 [1] CRAN (R 4.1.3)
## ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.3)
## ggridges * 0.5.3 2021-01-08 [1] CRAN (R 4.1.3)
## ggsignif 0.6.3 2021-09-09 [1] CRAN (R 4.1.3)
## ggthemes 4.2.4 2021-01-20 [1] CRAN (R 4.1.3)
## GlobalOptions 0.1.2 2020-06-10 [1] CRAN (R 4.1.3)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.3)
## gplots 3.1.1 2020-11-28 [1] CRAN (R 4.1.3)
## graph 1.72.0 2021-10-26 [1] Bioconductor
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.3)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.3)
## gtools 3.9.2 2021-06-06 [1] CRAN (R 4.1.3)
## haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.3)
## hexbin 1.28.2 2021-01-08 [1] CRAN (R 4.1.3)
## highr 0.9 2021-04-16 [1] CRAN (R 4.1.3)
## hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.3)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.3)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.3)
## httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.1.3)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.3)
## igraph * 1.2.11 2022-01-04 [1] CRAN (R 4.1.3)
## IRanges 2.28.0 2021-10-26 [1] Bioconductor
## irlba 2.3.5 2021-12-06 [1] CRAN (R 4.1.3)
## isoband 0.2.5 2021-07-13 [1] CRAN (R 4.1.3)
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.1.3)
## jpeg 0.1-9 2021-07-24 [1] CRAN (R 4.1.1)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.3)
## jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.3)
## KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.3)
## knitr * 1.38 2022-03-25 [1] CRAN (R 4.1.3)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.1)
## laeken 0.5.2 2021-10-06 [1] CRAN (R 4.1.3)
## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.3)
## lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.3)
## latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.1.3)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.3)
## lmtest 0.9-40 2022-03-21 [1] CRAN (R 4.1.3)
## lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.3)
## magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.3)
## MASS 7.3-55 2022-01-16 [1] CRAN (R 4.1.3)
## Matrix * 1.4-0 2021-12-08 [1] CRAN (R 4.1.3)
## MatrixGenerics 1.6.0 2021-10-26 [1] Bioconductor
## matrixStats 0.61.0 2021-09-17 [1] CRAN (R 4.1.3)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.3)
## mgcv 1.8-39 2022-02-24 [1] CRAN (R 4.1.3)
## mime 0.12 2021-09-28 [1] CRAN (R 4.1.1)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.3)
## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.3)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.3)
## ncdfFlow 2.40.0 2021-10-26 [1] Bioconductor
## needs * 0.0.3 2016-03-28 [1] CRAN (R 4.1.1)
## nlme 3.1-155 2022-01-16 [1] CRAN (R 4.1.3)
## nnet 7.3-17 2022-01-16 [1] CRAN (R 4.1.3)
## pacman * 0.5.1 2019-03-11 [1] CRAN (R 4.1.3)
## pcaMethods 1.86.0 2021-10-26 [1] Bioconductor
## pdist 1.2 2013-02-03 [1] CRAN (R 4.1.1)
## permute 0.9-7 2022-01-27 [1] CRAN (R 4.1.3)
## pheatmap * 1.0.12 2019-01-04 [1] CRAN (R 4.1.3)
## pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.3)
## pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.3)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.3)
## pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.3)
## plyr * 1.8.7 2022-03-24 [1] CRAN (R 4.1.3)
## png 0.1-7 2013-12-03 [1] CRAN (R 4.1.1)
## polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.1.1)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.3)
## processx 3.5.3 2022-03-25 [1] CRAN (R 4.1.3)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.3)
## proxy 0.4-26 2021-06-07 [1] CRAN (R 4.1.3)
## ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.3)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.3)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.3)
## ranger 0.13.1 2021-07-14 [1] CRAN (R 4.1.3)
## RANN 2.6.1 2019-01-08 [1] CRAN (R 4.1.3)
## RBGL 1.70.0 2021-10-26 [1] Bioconductor
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.1.3)
## Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.1.3)
## RcppAnnoy 0.0.19 2021-07-30 [1] CRAN (R 4.1.3)
## RcppEigen 0.3.3.9.2 2022-04-08 [1] CRAN (R 4.1.3)
## RcppHNSW 0.3.0 2020-09-06 [1] CRAN (R 4.1.3)
## D RcppParallel 5.1.5 2022-01-05 [1] CRAN (R 4.1.3)
## RCurl 1.98-1.6 2022-02-08 [1] CRAN (R 4.1.2)
## readr * 2.1.2 2022-01-30 [1] CRAN (R 4.1.3)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.3)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.3)
## reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.3)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.3)
## Rgraphviz 2.38.0 2021-10-26 [1] Bioconductor
## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.1.2)
## rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.3)
## rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.3)
## robustbase 0.95-0 2022-04-02 [1] CRAN (R 4.1.3)
## Rphenograph * 0.99.1 2022-03-30 [1] Github (JinmiaoChenLab/Rphenograph@0298487)
## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.1.3)
## RProtoBufLib 2.6.0 2021-10-26 [1] Bioconductor
## RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.1.3)
## rstatix 0.7.0 2021-02-13 [1] CRAN (R 4.1.3)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.3)
## Rtsne 0.16 2022-04-17 [1] CRAN (R 4.1.3)
## rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.3)
## S4Vectors 0.32.3 2021-11-21 [1] Bioconductor
## sass 0.4.1 2022-03-23 [1] CRAN (R 4.1.3)
## scales 1.2.0 2022-04-13 [1] CRAN (R 4.1.3)
## scattermore 0.8 2022-02-14 [1] CRAN (R 4.1.3)
## scatterplot3d 0.3-41 2018-03-14 [1] CRAN (R 4.1.1)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.3)
## shape 1.4.6 2021-05-19 [1] CRAN (R 4.1.1)
## shiny 1.7.1 2021-10-02 [1] CRAN (R 4.1.3)
## shinyFiles 0.9.1 2021-11-10 [1] CRAN (R 4.1.3)
## SingleCellExperiment 1.16.0 2021-10-26 [1] Bioconductor
## smoother 1.1 2015-04-16 [1] CRAN (R 4.1.3)
## sp 1.4-7 2022-04-20 [1] CRAN (R 4.1.3)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.3)
## SummarizedExperiment 1.24.0 2021-10-26 [1] Bioconductor
## testthat 3.1.2 2022-01-20 [1] CRAN (R 4.1.3)
## tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.1.3)
## tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.1.3)
## tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.3)
## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.3)
## TTR 0.24.3 2021-12-12 [1] CRAN (R 4.1.3)
## tweenr 1.0.2 2021-03-23 [1] CRAN (R 4.1.3)
## tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.3)
## usethis * 2.1.5 2021-12-09 [1] CRAN (R 4.1.3)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.3)
## uwot * 0.1.11 2021-12-02 [1] CRAN (R 4.1.3)
## vcd 1.4-9 2021-10-18 [1] CRAN (R 4.1.3)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.3)
## vegan 2.6-2 2022-04-17 [1] CRAN (R 4.1.3)
## VGAM 1.1-6 2022-02-14 [1] CRAN (R 4.1.3)
## VIM 6.1.1 2021-07-22 [1] CRAN (R 4.1.3)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.3)
## xfun 0.30 2022-03-02 [1] CRAN (R 4.1.3)
## XML 3.99-0.9 2022-02-24 [1] CRAN (R 4.1.2)
## xml2 1.3.3 2021-11-30 [1] CRAN (R 4.1.3)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.3)
## xts 0.12.1 2020-09-09 [1] CRAN (R 4.1.3)
## XVector 0.34.0 2021-10-26 [1] Bioconductor
## yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
## zlibbioc 1.40.0 2021-10-26 [1] Bioconductor
## zoo 1.8-10 2022-04-15 [1] CRAN (R 4.1.3)
##
## [1] D:/Programme/R/R-4.1.3/library
##
## D -- DLL MD5 mismatch, broken installation.
##
## ------------------------------------------------------------------------------